    Implementation, development and optimization of a pipeline for biophysical analysis on low-power clusters

    This thesis studies the computational efficiency of low-power computing nodes for biophysical analysis, compared with traditional nodes. The work is part of a project to assess the feasibility of using low-power machines for high-performance computing. The aim of the research is to demonstrate that low-power clusters can provide computing power comparable to that of traditional ones. The work focuses on one of the most recent methods in research on the genetic mutations that cause various types of cancer: the GATK-LODn system. In the course of the thesis, one component of this method was reimplemented as a pipeline in Snakemake, which allowed finer-grained management of the individual operations in order to optimize the overall execution. The thesis uses this bioinformatics algorithm to assess whether the capabilities of low-power nodes can really be compared with those of traditional nodes, since the algorithm demands high computational performance, memory and storage capacity. The first chapter explains the elements of the project. It presents the GATK-LODn method, then describes the part of the method that was reimplemented with Snakemake and examines the capabilities of this tool in more depth. Finally, it explains the meaning of "low-power node" and describes the characteristics of the nodes used in the analyses. The second chapter explains how the program works, discussing the parameters used in detail, and highlights the steps required to use the method correctly. It also describes the phases of the statistical study and the type of simulations performed. Finally, the most relevant results for each rule of the pipeline are discussed in terms of execution time and memory usage.
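
    Snakemake derives the order of a workflow's steps from the input and output files declared by each rule. The toy sketch below is plain Python, not Snakemake code, and its rule and file names are hypothetical: it builds a dependency graph by matching each rule's inputs to the rules that produce them and then orders the graph topologically. Snakemake itself works backwards from the requested target files and adds wildcards, scheduling and resource handling on top of this idea.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical rules (name -> input files, output files). The rule and file
# names are illustrative only; they are not the rules of the thesis pipeline.
RULES = {
    "align": (["sample.fastq"], ["sample.bam"]),
    "sort": (["sample.bam"], ["sample.sorted.bam"]),
    "call_variants": (["sample.sorted.bam"], ["sample.vcf"]),
    "compute_lodn": (["sample.vcf"], ["sample.lodn.tsv"]),
}


def execution_order(rules):
    """Order rules so that each one runs after the rules that produce its inputs."""
    # Map every output file to the rule that produces it.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    # Each rule depends on the producers of its inputs; inputs with no
    # producer (e.g. raw reads) are treated as files that already exist.
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(RULES))
```

    Deriving the order from file dependencies, rather than hard-coding it, is what lets a workflow manager rerun only the steps whose inputs have changed.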

    Application of a learning algorithm based on out-of-equilibrium systems to Genome Wide Association data

    The phenomenon of learning can be studied with the methods of Statistical Mechanics. Starting from Neural Networks, the learning problem can be described as a system of interacting spins. Using an equilibrium description of the system, and under suitable conditions, this problem turns out to be computationally hard. Nevertheless, there are heuristic algorithms that solve the same problem efficiently. It is shown that this apparent inconsistency arises because the solution space reached by the heuristic algorithms does not coincide with the one expected at equilibrium. Using an out-of-equilibrium distribution, it is possible to build the replicated focusing Belief Propagation (rfBP) algorithm, whose results, both in computational performance and in the nature of its solutions, are in line with those of the heuristic algorithms. This work highlights how the combined use of Spin-Glass models, graphs and Neural Networks can provide a solid theoretical basis for developing original and innovative machine learning algorithms. In addition, this work introduces a new C++ library optimized for parallel computation of the rfBP algorithm and applies the algorithm to Genome Wide Association data. Genome samples of the bacterium Salmonella hosted by different animals were considered, and the rfBP algorithm was trained on the occurrence of mutations (Single Nucleotide Polymorphisms, SNPs) in an attempt to determine which animal hosted them. The goal of this application is to understand how bacterial genomes are influenced by their animal host and whether it is possible to identify features that allow the host to be inferred from the SNP sequence. This work shows that, on these SNP sequences, the rfBP algorithm achieves performance comparable or superior to that obtained with the most common Machine Learning techniques.
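
    The spin description of learning mentioned above is typically illustrated with a binary perceptron, in which the weights play the role of Ising spins w_i = ±1 and the energy of a configuration is the number of misclassified training patterns. The sketch below computes only that cost on random teacher-generated toy data (the sizes N and P and the teacher-student setup are assumptions for illustration). It is not rfBP and not the C++ library described in the thesis.

```python
import random


def sign(x):
    """Sign activation; ties map to -1 (they cannot occur for odd N with +/-1 entries)."""
    return 1 if x > 0 else -1


def training_errors(weights, patterns, labels):
    """Energy of a binary-perceptron configuration: number of misclassified patterns.

    weights  -- list of N spins, each +1 or -1
    patterns -- list of P input vectors, entries +1 or -1
    labels   -- list of P desired outputs, each +1 or -1
    """
    errors = 0
    for x, y in zip(patterns, labels):
        field = sum(w * xi for w, xi in zip(weights, x))  # local field
        if sign(field) != y:
            errors += 1
    return errors


# Toy teacher-student data (hypothetical sizes): a random "teacher" labels the
# patterns, so the teacher configuration itself has zero energy.
rng = random.Random(0)
N, P = 101, 60
teacher = [rng.choice((-1, 1)) for _ in range(N)]
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
labels = [sign(sum(t * xi for t, xi in zip(teacher, x))) for x in patterns]
student = [rng.choice((-1, 1)) for _ in range(N)]

print("teacher errors:", training_errors(teacher, patterns, labels))
print("random student errors:", training_errors(student, patterns, labels))
```

    Learning then amounts to finding a spin configuration with zero (or low) energy; algorithms such as rfBP differ in how they search this space.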

    Intraspecies characterization of bacteria via evolutionary modeling of protein domains

    The ability to detect and characterize bacteria within a biological sample is crucial for monitoring infections and epidemics, as well as for studying human health and its relationship with commensal microorganisms. To this end, a commonly used technique is 16S rRNA gene targeted sequencing. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Representative OTU sequences are then compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and use an ecological approach to model their Relative Species Abundance distribution. Based on the model parameters, we then derive a new measure of phylogenetic distance. Finally, we show that this model-based distance can detect differences between bacteria in cases where the 16S rRNA-based method fails, providing a potentially complementary approach that is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.
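
    The OTU step described above groups sequences by pairwise similarity. A minimal sketch of greedy centroid-based clustering, in the spirit of tools such as UCLUST or VSEARCH, follows. It assumes the sequences are already aligned to equal length and uses plain per-position identity, whereas real tools compute identity from pairwise alignments; the 97% default is a commonly used convention, not a value from this study, which instead proposes a reference-free, domain-based distance.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering.

    Each sequence joins the first existing centroid it matches at >= threshold
    identity; otherwise it becomes the centroid of a new OTU.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:  # no centroid was close enough: open a new OTU
            centroids.append(s)
            clusters.append([s])
    return clusters


# Hypothetical toy reads; a lower threshold is used because they are short.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
for otu in greedy_otus(reads, threshold=0.85):
    print(otu)
```

    Because the result depends on input order and on the chosen threshold, OTU assignments are only an approximation of phylogeny, which is part of the motivation for the model-based distance proposed above.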

    Automated Prediction of the Response to Neoadjuvant Chemoradiotherapy in Patients Affected by Rectal Cancer

    Simple Summary: Colorectal cancer is the second deadliest tumor by number of deaths, after lung cancer, and the third by number of new cases, after breast and lung cancer. The correct and rapid identification of the tumor (i.e., segmentation of the cancer regions) is a fundamental task for correct patient diagnosis. In this study, we propose a novel automated pipeline for the segmentation of MRI scans of patients with locally advanced rectal cancer (LARC) in order to predict the response to neoadjuvant chemoradiotherapy (nCRT) using radiomic features. This study involved the retrospective analysis of T2-weighted MRI scans of 43 patients affected by LARC. The segmentation of tumor areas was on par with or better than state-of-the-art results, while requiring a smaller sample size. The analysis of radiomic features allowed us to predict the TRG score, with results in agreement with the state of the art. Background: Rectal cancer is a malignant neoplasm of the large intestine resulting from the uncontrolled proliferation of cells in the rectal tract. Predicting the pathologic response to neoadjuvant chemoradiotherapy from a primary staging MRI scan in patients affected by LARC could significantly improve the survival and quality of life of the patients. In this study, we evaluated the possibility of automating this estimation from a primary staging MRI scan, using a fully automated artificial intelligence-based model for the segmentation of the tumor areas and their subsequent characterization using radiomic features. The TRG score was used to evaluate the clinical outcome. Methods: Forty-three patients under treatment at the IRCCS Sant'Orsola-Malpighi Polyclinic were retrospectively selected for the study; a U-Net model was trained for the automated segmentation of the tumor areas; and the radiomic features were collected and used to predict the tumor regression grade (TRG) score.
Results: The segmentation of tumor areas outperformed state-of-the-art results in terms of the Dice score coefficient, or was comparable to them while having the advantage of including mucinous cases. Analysis of the radiomic features extracted from the lesion areas allowed us to predict the TRG score, with results in agreement with the state of the art. Conclusions: The TRG prediction results obtained with the proposed fully automated pipeline indicate that it could be used as a viable decision support system for radiologists in clinical practice.
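
    The Dice score coefficient used to evaluate the segmentations measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal Python sketch for binary masks stored as 2-D nested lists follows; returning 1.0 for two empty masks is a convention assumed here, not something stated in the study.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape.

    Masks are 2-D nested lists (e.g. one MRI slice) whose entries are 0 or 1.
    """
    flat_pred = [v for row in pred for v in row]
    flat_truth = [v for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(dice_coefficient(pred, truth))  # 2 * 2 / (3 + 3), about 0.667
```

    In practice the masks would be NumPy arrays or full 3-D volumes, but the formula is the same.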

    A sex-informed approach to improve the personalised decision making process in myelodysplastic syndromes: a multicentre, observational cohort study

    Background: Sex is a major source of diversity among patients, and a sex-informed approach is becoming a new paradigm in precision medicine. We aimed to describe sex diversity in myelodysplastic syndromes in terms of disease genotype, phenotype, and clinical outcome. Moreover, we sought to incorporate sex information into the clinical decision-making process as a fundamental component of patient individuality. Methods: In this multicentre, observational cohort study, we retrospectively analysed 13 284 patients aged 18 years or older with a diagnosis of myelodysplastic syndrome according to the 2016 WHO criteria, included in the EuroMDS network (n=2025), the International Working Group for Prognosis in MDS (IWG-PM; n=2387), the Spanish Group of Myelodysplastic Syndromes registry (GESMD; n=7687), or the Dusseldorf MDS registry (n=1185). Recruitment periods for these cohorts were between 1990 and 2016. The correlation between sex and genomic features was analysed in the EuroMDS cohort and validated in the IWG-PM cohort. The effect of sex on clinical outcome, with overall survival as the main endpoint, was analysed in the EuroMDS population and validated in the other three cohorts. Finally, novel prognostic models incorporating sex and genomic information were built and validated, and compared with the widely used revised International Prognostic Scoring System (IPSS-R). This study is registered with ClinicalTrials.gov, NCT04889729. Findings: The study included 7792 (58.7%) men and 5492 (41.3%) women. 10 906 (82.1%) patients were White, and race was not reported for 2378 (17.9%) patients.
Sex biases were observed at the single-gene level, with mutations in seven genes enriched in men (ASXL1, SRSF2, and ZRSR2 p<0.0001 in both cohorts; DDX41 not available in the EuroMDS cohort vs p=0.0062 in the IWG-PM cohort; IDH2 p<0.0001 in EuroMDS vs p=0.042 in IWG-PM; TET2 p=0.031 vs p=0.035; U2AF1 p=0.033 vs p<0.0001) and mutations in two genes enriched in women (DNMT3A p<0.0001 in EuroMDS vs p=0.011 in IWG-PM; TP53 p=0.030 vs p=0.037). Additionally, sex biases were observed in co-mutational pathways of founding genomic lesions (splicing-related genes, predominantly in men, p<0.0001 in both the EuroMDS and IWG-PM cohorts), in DNA methylation (predominantly in men, p=0.046 in EuroMDS vs p<0.0001 in IWG-PM), and in TP53 mutational pathways (predominantly in women, p=0.0073 in EuroMDS vs p<0.0001 in IWG-PM). In the retrospective EuroMDS cohort, men had worse median overall survival (81.3 months, 95% CI 70.4-95.0 in men vs 123.5 months, 104.5-127.5 in women; hazard ratio [HR] 1.40, 95% CI 1.26-1.52; p<0.0001). This result was confirmed in the prospective validation cohorts (median overall survival was 54.7 months, 95% CI 52.4-59.1 in men vs 74.4 months, 69.3-81.2 in women; HR 1.30, 95% CI 1.23-1.35; p<0.0001 in the GESMD registry; 40.0 months, 95% CI 33.4-43.7 in men vs 54.2 months, 38.6-63.8 in women; HR 1.23, 95% CI 1.08-1.36; p<0.0001 in the Dusseldorf MDS registry). We developed new personalised prognostic tools that included sex information (the sex-informed prognostic scoring system and the sex-informed genomic scoring system). Sex maintained independent prognostic power in all prognostic systems; the highest performance was observed in the model that included both sex and genomic information.
A five-to-five mapping between the IPSS-R and the new score categories resulted in the re-stratification of 871 (43.0%) of 2025 patients from the EuroMDS cohort and 1003 (42.0%) of 2387 patients from the IWG-PM cohort by using the sex-informed prognostic scoring system, and of 1134 (56.0%) patients from the EuroMDS cohort and 1265 (53.0%) patients from the IWG-PM cohort by using the sex-informed genomic scoring system. We created a web portal that enables outcome predictions based on a sex-informed personalised approach. Interpretation: Our results suggest that a sex-informed approach can improve the personalised decision-making process in patients with myelodysplastic syndromes and should be considered in the design of clinical trials including low-risk patients. Copyright © 2022 Published by Elsevier Ltd. All rights reserved.
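
    Median overall survival figures such as those reported above are conventionally read off a Kaplan-Meier curve as the earliest time at which the estimated survival falls to 0.5 or below. The sketch below implements the Kaplan-Meier product-limit estimator on a hypothetical toy cohort; it does not reproduce the study's analysis, and the reported hazard ratios would additionally require a model such as Cox proportional hazards, which is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival) pairs, one per distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def median_survival(times, events):
    """Earliest time at which estimated survival is <= 0.5; None if never reached."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort (months of follow-up, 1 = death, 0 = censored).
times = [3, 5, 5, 8, 12, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1, 1, 0]
print("median overall survival:", median_survival(times, events))
```

    Censored patients contribute to the risk set until they leave follow-up, which is why the estimator, rather than a plain median of observed times, is used.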

    Development of machine learning methods for multi-modal biomarkers detection and integration

    In medicine, innovation depends on better knowledge of the mechanisms of the human body, a complex system of multi-scale constituents. Unraveling the complexity underlying diseases is challenging. A deep understanding of their inner workings requires dealing with a large amount of heterogeneous information. Exploring the molecular status and organization of genes, proteins, and metabolites provides insights into what drives a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why attention is now growing towards the joint exploitation of multi-scale information. Holistic methods are therefore drawing interest as a way to address the problem of integrating heterogeneous data. The heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies performed data integration using custom-designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study was devoted to determining shared gene regulatory signatures in onco-hematology, and it showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for atherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility on modelling heterogeneous data, which was shown to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types into a single heterogeneous latent representation, which facilitated the selection of the data types most important for predicting the tumour stage of invasive ductal carcinoma.
The results of these four studies laid the groundwork for easier detection of new biomarkers, ultimately beneficial to medical practice and to the ever-growing field of Personalized Medicine.

    Xavier Barral i Altet, Guido Dall’Olio and Daniele Manacorda talk about “Storie per tutti” (Stories for everyone)

    We publish here, in a version revised by the authors, the speeches by Xavier Barral i Altet, Guido Dall’Olio and Daniele Manacorda, given on 5 June 2014 at the Biblioteca di Storia moderna e contemporanea in Rome for the presentation of the monographic issue of «Il Capitale culturale. Studies on the Value of Cultural Heritage» devoted to the relationship between research and the dissemination of knowledge.

    The BovMAS Consortium: investigation of bovine chromosome 14 for quantitative trait loci affecting milk production and quality traits in the Italian Holstein Friesian breed

    Many studies have demonstrated that quantitative trait loci (QTL) can be identified and mapped in commercial dairy cattle populations using genetic markers in daughter and granddaughter designs. The final objective of these studies is to identify genes or markers that can be used in breeding schemes via marker-assisted selection (MAS).

    Heterogeneity of Cellular Senescence: Cell Type-Specific and Senescence Stimulus-Dependent Epigenetic Alterations

    The aim of the present study was to provide a comprehensive characterization of whole-genome DNA methylation patterns in replicative senescence and in premature senescence induced by ionizing irradiation or doxorubicin, exhaustively exploring epigenetic modifications in three different human cell types: somatic diploid skin fibroblasts, and bone marrow- and adipose-derived mesenchymal stem cells. With CpG-wise differential analysis, three epigenetic signatures were identified: (a) a cell type- and treatment-specific signature; (b) a cell type-specific senescence-related signature; and (c) a cell type-transversal replicative senescence-related signature. Cluster analysis revealed that only replicative senescent cells formed a distinct group, reflecting notable alterations in the DNA methylation patterns accompanying this cellular state. Replicative senescence-associated epigenetic changes appeared to be so extensive that they surpassed interpersonal dissimilarities. Pathway analysis of the genes involved in the cell type-transversal replicative senescence-related signature showed enrichment in pathways linked to the nervous system and involved in neurological functions. Although DNA methylation clock analysis provided no statistically significant evidence of senescence-related epigenetic age acceleration, a persistent trend of increased biological age was observed in replicative senescent cultures of all three cell types. Overall, this work indicates that senescent cells are heterogeneous depending on the tissue of origin and the type of senescence inducer, a heterogeneity that could putatively translate into distinct impacts on tissue homeostasis.